Notebook 3: Training a classifier
Loading the pyecog module
The easiest place to put and run this notebook is the pyecog directory downloaded from GitHub, e.g. “pyecog-Development”, because the pyecog module will be found in that folder. However, if you want to run the notebook from elsewhere on your computer, you first need to make sure that Python can find the pyecog module by using sys.path.append(). To do this, copy the following code into a cell, modify the path, and run it (shift+enter).
import sys
pyecog_path = '/home/jonathan/git_repos/pyecog' # replace this with the pyecog location
sys.path.append(pyecog_path)
If you are on Windows, you have to deal with the problem that backslashes in your paths are treated as escape characters by Python. Prefixing the string with ‘r’ (a raw string) prevents this.
pyecog_path_windows = r'home\jonathan\git_repos\pyecog' # replace this with the pyecog location
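As a minimal sketch of the step above, the helper below checks that the folder exists before adding it to sys.path and avoids adding it twice. The name `add_module_dir` is hypothetical and is not part of pyecog. The two assertions at the end illustrate why the ‘r’ prefix matters for backslashes.

```python
import os
import sys


def add_module_dir(path):
    """Append `path` to sys.path if it is an existing directory.

    Hypothetical helper for illustration only; not part of pyecog.
    Returns the absolute path that is now on sys.path.
    """
    abs_path = os.path.abspath(os.path.expanduser(path))
    if not os.path.isdir(abs_path):
        raise FileNotFoundError("No such directory: " + abs_path)
    if abs_path not in sys.path:
        sys.path.append(abs_path)
    return abs_path


# Raw string: backslash and 'n' stay two separate characters.
assert len(r'\n') == 2
# Normal string: '\n' is a single newline character.
assert len('\n') == 1
```

Calling `add_module_dir(pyecog_path)` with a mistyped path then fails immediately with a clear error, rather than later at `import pyecog`.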
In [ ]:
import sys
import os
import pandas as pd
In [ ]:
# if you are in the directory downloaded from github you do not have to run this cell
pyecog_path = '/home/jonathan/git_repos/pyecog' # replace this with the Pyecog-Master location
sys.path.append(pyecog_path)
In [ ]:
import pyecog as pg
pg # check the module is imported from where you expect
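If displaying the module is not informative enough, one way to see where an import would come from is to ask the import system directly. The sketch below uses only the standard library; `module_origin` is a hypothetical helper name, and `'json'` is used only because it ships with Python. In practice you would pass `'pyecog'`.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from.

    Returns None if Python cannot find the module on sys.path.
    Hypothetical helper for illustration only.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Stand-in module name; replace 'json' with 'pyecog' in the notebook.
print(module_origin('json'))
```

A result of None means the path added with sys.path.append() does not contain the module.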
In [ ]:
from sklearn.ensemble import RandomForestClassifier
Make and train classifier
- First we set the path to the library that the classifier will be trained on. Here we use test30 from the previous notebook.
In [2]:
seizure_lib_path = 'test30'
In [ ]:
clf = pg.Classifier(library_path=seizure_lib_path)
In [ ]:
# check out the data we are using to train the classifier
clf.lib.get_dataframe().head()
In [ ]:
# and the number of samples
clf.lib.X.shape
In [ ]:
clf.preprocess_features()
In [ ]:
# note: n_jobs=-1 uses all cores; set n_jobs to a positive number to use fewer.
descrim_algo = RandomForestClassifier(n_estimators=800, random_state=7, n_jobs=-1)
clf.algo = pg.ClassificationAlgorithm(descrim_algo, pg.HMM())
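If you want training to leave some cores free for other work, a value for n_jobs can be derived from the machine's core count. This is a small sketch using only the standard library; `jobs_leaving_free` is a hypothetical helper name, not part of pyecog or scikit-learn.

```python
import os


def jobs_leaving_free(n_free=1):
    """Number of parallel jobs that leaves `n_free` cores idle (at least 1)."""
    total = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, total - n_free)


n_jobs = jobs_leaving_free(1)
print(n_jobs)  # pass as RandomForestClassifier(..., n_jobs=n_jobs)
```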
In [ ]:
clf.algo.descriminative_model
In [ ]:
clf.algo.hmm
In [ ]:
# We are now ready to train the classifier
In [ ]:
%%time
clf.train()
In [ ]:
# choose where to save the classifier!
clf.save('test_30_clf_trained.p')
In [ ]:
# Here we double check that we can predict something and everything seems to be working
In [ ]:
preds = clf.predict(clf.X, 0.5)
In [ ]:
import sklearn.metrics as sk_metrics
In [ ]:
print(sk_metrics.classification_report(clf.y, preds))
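Note that the report above appears to be computed on the same library the classifier was trained on, so it mainly confirms that prediction runs end to end and will tend to overstate performance. The sketch below shows the general idea of scoring on held-out data with scikit-learn alone. It uses synthetic data as a stand-in, not pyecog features, and a plain random forest without the HMM stage. A random split also ignores time ordering, so treat it as an illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption for illustration; not pyecog features).
X, y = make_classification(n_samples=300, n_features=10, random_state=7)

# Hold back 25% of the samples, keeping class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=7
)

forest = RandomForestClassifier(n_estimators=50, random_state=7)
forest.fit(X_train, y_train)

# Score only on samples the model has not seen during training.
test_preds = forest.predict(X_test)
report = classification_report(y_test, test_preds, output_dict=True)
print(classification_report(y_test, test_preds))
```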